English-Latvian SMT: knowledge or data?

نویسندگان

  • Inguna Skadina
  • Edgars Bralitis
چکیده

In cases when phrase-based statistical machine translation (SMT) is applied to languages with rather free word order and rich morphology, translated texts often are not fluent due to misused inflectional forms and wrong word order between phrases or even inside the phrase. One of possible solutions how to improve translation quality is to apply factored models. The paper presents work on English-Latvian phrase-based and factored SMT systems and, using evaluation results, demonstrates that although factored models seem more appropriate for highly inflected languages, they have rather small influence on translation results, while using phrase-model with more data better translation quality could be achieved.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improving SMT with Morphology Knowledge for Baltic Languages

In the recent years, several machine translation systems have been built for the Baltic languages. Besides Google and Microsoft machine translation engines and research experiments with statistical MT for Latvian [1] and Lithuanian, there are both English-Latvian [2] and English-Lithuanian [3] rulebased MT systems available. Both Latvian and Lithuanian are morphologically rich languages with qu...

متن کامل

NMT or SMT: Case Study of a Narrow-domain English-Latvian Post-editing Project

The recent technological shift in machine translation from statistical machine translation (SMT) to neural machine translation (NMT) raises the question of the strengths and weaknesses of NMT. In this paper, we present an analysis of NMT and SMT systems’ outputs from narrow domain English-Latvian MT systems that were trained on a rather small amount of data. We analyze post-edits produced by pr...

متن کامل

English-latvian SMT: the challenge of translating into a free word order language

This paper presents a comparative study of two approaches to statistical machine translation (SMT) and their application to a task of English-to-Latvian translation, which is still an open research line in the field of automatic translation. We consider a state-of-the-art phrase-based SMT and an alternative N -gram-based SMT systems. The major differences between these two approaches lie in the...

متن کامل

Towards Improving English-Latvian Translation: A System Comparison and a New Rescoring Feature

This paper presents a comparative study of two alternative approaches to statistical machine translation (SMT) and their application to a task of English-to-Latvian translation. Furthermore, a novel feature intending to reflect the relatively free word order scheme of the Latvian language is proposed and successfully applied on the n-best list rescoring step. Moving beyond classical automatic s...

متن کامل

Improving SMT for Baltic Languages with Factored Models

This paper reports on implementation and evaluation of English-Latvian and Lithuanian-English statistical machine translation systems. It also gives brief introduction of project scope – Baltic languages, prior implementations of MT and evaluation of MT systems. In this paper we report on results of both automatic and human evaluation. Results of human evaluation show that factored SMT gives si...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009